Abstract: Explicit characterization and computation of the 
multi-source network coding capacity region (or even bounds) 
is a long-standing open problem. In fact, finding the capacity 
region requires determination of the set of all entropic vectors 
T*, which is known to be an extremely hard problem. On the 
other hand, calculating the explicitly known linear programming 
bound is very hard in practice due to an exponential growth 
in complexity as a function of network size. We give a new, 
easily computable outer bound, based on characterization of 
all functional dependencies in networks. We also show that the 
proposed bound is tighter than some known bounds. 

I. Introduction 

The network coding approach introduced in [1], [2] general- 
izes routing by allowing intermediate nodes to forward packets 
that are coded combinations of all received data packets. This 
yields many benefits that are by now well documented in 
the literature [3]-[6]. One fundamental open problem is to 
characterize the capacity region and the classes of codes that 
achieve capacity. The single session multicast problem is well 
understood. In this case, the capacity region is characterized by 
max-flow/min-cut bounds and linear network codes maximize 
throughput [2]. 

Significant complications arise in more general scenarios, 
involving more than one session. Linear network codes are not 
sufficient for the multi-source problem [7], [8]. Furthermore, 
a computable characterization of the capacity region is still 
unknown. One approach is to bound the capacity region by the 
intersection of a set of hyperplanes (specified by the network 
topology and sink demands) and the set of entropy functions 
T* (inner bound), or its closure $\bar{T}^*$ (outer bound) [3], [9], [10]. 
An exact expression for the capacity region does exist, again 
in terms of T* [11]. Unfortunately, neither this expression nor 
the bounds [3], [9], [10] can be computed in practice, due 
to the lack of an explicit characterization of the set of entropy 
functions for more than three random variables. In fact, it is 
now known that T* cannot be described as the intersection 
of finitely many half-spaces [12]. The difficulties arising from 
the structure of T* are not simply an artifact of the way the 
capacity region and bounds are written. In fact it has been 
shown that the problem of determining the capacity region 
for multi-source network coding is completely equivalent to 
characterization of T* [8]. 

One way to resolve this difficulty is via relaxation of the 
bound, replacing the set of entropy functions with the set 
of polymatroids T (which has a finite characterization). In 
practice, however, the number of variables and constraints 
increases exponentially with the number of links in the network, 
and this prevents practical computation for any meaningful 
case of interest. 

In this paper, we provide an easily computable relaxation of 
the LP bound. The main idea is to find sets of edges which are 
determined by the source constraints and sink demands such 
that the total capacity of these sets bounds the total throughput. 
The resulting bound is tighter than the network sharing bound 
[13] and the bounds based on information dominance [14]. 

Section II provides some background on pseudo-variables 
and pseudo-entropy functions (which generalize entropy 
functions) [8]. These pseudo-variables are used to describe a family 
of linear programming bounds on the capacity region for 
network coding. In Section III we give an abstract definition 
of a functional dependence graph, which expresses a set of 
local dependencies between pseudo-variables (in fact a set 
of constraints on the pseudo-entropy). Our definition extends 
that introduced by Kramer [15] to accommodate cycles. This 
section also provides the main technical ingredients for our 
new bound. In particular, we describe a test for functional 
dependence, and give a basic result relating local and global 
dependence. The main result is presented in Section IV. 

Notation: Sets will be denoted with calligraphic typeface, 
e.g. $\mathcal{X}$. Set complement is denoted by the superscript $\mathcal{X}^c$ 
(where the universal set will be clear from context). Set 
subscripts identify the set of objects indexed by the subscript: 
$X_A = \{X_a, a \in A\}$. The power set $2^{\mathcal{X}} = \{A, A \subseteq \mathcal{X}\}$ is the 
collection of all subsets of $\mathcal{X}$. Where no confusion will arise, 
set union will be denoted by juxtaposition, $A \cup B = AB$, and 
singletons will be written without braces. 

II. Background 

A. Pseudo Variables 

We give a brief review of the concept of pseudo-variables, 
introduced in [8]. Let $\mathcal{N} = \{1, 2, \ldots, N\}$ be a finite set, and 
let $X_{\mathcal{N}} = \{X_1, X_2, \ldots, X_N\}$ be a ground set associated with 
a real-valued function $g : 2^{X_{\mathcal{N}}} \to \mathbb{R}$ defined on subsets of $X_{\mathcal{N}}$, 
with $g(\emptyset) = 0$. We refer to the elements of $X_{\mathcal{N}}$ as pseudo- 
variables and the function g as a pseudo-entropy function. 
Pseudo-variables and pseudo-entropy generalize the familiar 
concepts of random variables and entropy. Pseudo-variables 
do not necessarily take values, and there may be no associated 
joint probability distribution. A pseudo-entropy function may 
assign values to subsets of $X_{\mathcal{N}}$ in a way that is not consistent 
with any distribution on a set of N random variables. A 
pseudo-entropy function g can be viewed as a point in a 
$2^N$-dimensional Euclidean space, where each coordinate of the 
space is indexed by a subset of $X_{\mathcal{N}}$. 

A function g is called polymatroidal if it satisfies the 
polymatroid axioms: 

$$ g(\emptyset) = 0 \tag{1} $$
$$ g(X_A) \ge g(X_B), \ \text{if } B \subseteq A \quad \text{(non-decreasing)} \tag{2} $$
$$ g(X_A) + g(X_B) \ge g(X_{A \cup B}) + g(X_{A \cap B}) \quad \text{(submodular)} \tag{3} $$

It is called Ingletonian if it satisfies Ingleton's inequalities 
(note that Ingletonian g are also polymatroids) [16]. A function 
g is entropic if it corresponds to a valid assignment of joint 
entropies on N random variables, i.e. there exists a joint 
distribution on N discrete finite random variables $Y_1, \ldots, Y_N$ 
with $g(X_A) = H(Y_A)$, $A \subseteq \mathcal{N}$. Finally, g is almost entropic 
if there exists a sequence of entropic functions $g^{(k)}$ such that 
$\lim_{k \to \infty} g^{(k)} = g$. Let $T^*$, $\bar{T}^*$, $T^{In}$ and $T$ respectively 
denote the sets of all entropic, almost entropic, Ingletonian 
and polymatroidal functions. These sets satisfy 
$T^* \subset \bar{T}^* \subset T$ and $T^{In} \subset T$. 

Both T and $T^{In}$ are polyhedra. They can be expressed as 
the intersection of a finite number of half-spaces in $\mathbb{R}^{2^N}$. 
In particular, every $g \in T$ satisfies (1)-(3), which can be 
expressed minimally in terms of 

$$ N + \binom{N}{2} 2^{N-2} $$

linear inequalities involving $2^N - 1$ variables [9]. Each $g \in T^{In}$ 
satisfies an additional 

$$ \frac{1}{4} 6^N - 5^N + \frac{3}{2} 4^N - 3^N + \frac{1}{4} 2^N $$

linear inequalities [16]. 

Definition 1: Let $A, B \subseteq \mathcal{X}$ be subsets of a set of 
pseudo-variables $\mathcal{X}$ with pseudo-entropy g. Define¹ 

$$ g(B \mid A) \triangleq g(AB) - g(A). \tag{4} $$

A pseudo-variable $X \in \mathcal{X}$ is said to be a function of a set of 
pseudo-variables $A \subseteq \mathcal{X}$ if $g(X \mid A) = 0$. 

Definition 2: Two subsets of pseudo-variables A and B are 
called independent if $g(AB) = g(A) + g(B)$, denoted by 
$A \perp B$. 
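
For small ground sets, the axioms (1)-(3) and Definitions 1 and 2 can be checked mechanically. The following Python sketch is only an illustration: the representation of g as a dictionary keyed by frozensets, the function names, and the example (two independent fair bits and their XOR, measured in bits) are assumptions and do not come from [8].

```python
from itertools import combinations

def subsets(ground):
    """All subsets of `ground`, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_polymatroid(g, ground, tol=1e-9):
    """Check axioms (1)-(3) for g, a dict from frozenset to float."""
    if abs(g[frozenset()]) > tol:                          # axiom (1)
        return False
    for a in subsets(ground):
        for b in subsets(ground):
            if b <= a and g[a] < g[b] - tol:               # axiom (2)
                return False
            if g[a] + g[b] < g[a | b] + g[a & b] - tol:    # axiom (3)
                return False
    return True

def cond(g, b, a):
    """Conditional pseudo-entropy g(B | A) = g(AB) - g(A), as in (4)."""
    a, b = frozenset(a), frozenset(b)
    return g[a | b] - g[a]

def independent(g, a, b, tol=1e-9):
    """Definition 2: A and B are independent if g(AB) = g(A) + g(B)."""
    a, b = frozenset(a), frozenset(b)
    return abs(g[a | b] - g[a] - g[b]) <= tol

# Hypothetical example: X1, X2 independent fair bits, X3 = X1 xor X2,
# so the entropy of any subset is min(|subset|, 2) bits.
ground = {1, 2, 3}
g = {s: float(min(len(s), 2)) for s in subsets(ground)}

print(is_polymatroid(g, ground))    # True
print(cond(g, {3}, {1, 2}))         # 0.0: X3 is a function of {X1, X2}
print(independent(g, {1}, {2}))     # True
```

The brute-force check visits all pairs of subsets, so it is only practical for a handful of pseudo-variables.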

B. Network Coding and Capacity Bounds 

Let the directed acyclic graph $Q = (V, E)$ serve as a 
simplified model of a communication network with error- 
free point-to-point communication links. Edges $e \in E$ have 
capacity $C_e > 0$. For edges $e, f \in E$, write $f \to e$ as shorthand 
for head(f) = tail(e). Similarly, for an edge $f \in E$ and a node 
$u \in V$, the notations $f \to u$ and $u \to f$ respectively denote 
head(f) = u and tail(f) = u. 

1 Note that this makes the chain rule for pseudo-entropies true by 
definition. 



Let S be an index set for a number of multicast sessions, 
and let $\{Y_s : s \in S\}$ be the set of source variables. These 
sources are available at the nodes identified by the mapping 
$a : S \to V$. Each source may be demanded by multiple sink 
nodes, identified by the mapping $b : S \to 2^V$. Each edge 
$e \in E$ carries a variable $U_e$ which is a function of incident 
edge variables and source variables. 

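Definition 3: Given a network $Q = (V, E)$, with sessions 
S, source locations a and sink demands b, and a subset of 
pseudo-entropy functions $A \subset \mathbb{R}^{2^{|S|+|E|}}$ on pseudo-variables 
$Y_S, U_E$, let $\mathcal{R}(A) = \{(R_s, s \in S)\}$ be the set of source rate 
tuples for which there exists a $g \in A$ satisfying 

$$ \perp_{s \in S} Y_s, \ \text{i.e. } g(Y_S) = \sum_{s \in S} g(Y_s) \tag{$\mathcal{C}_1$} $$
$$ g(U_e \mid \{Y_s : a(s) \to e\}, \{U_f : f \to e\}) = 0, \quad e \in E \tag{$\mathcal{C}_2$} $$
$$ g(Y_s \mid U_e : e \to u) = 0, \quad u \in b(s) \tag{$\mathcal{C}_3$} $$
$$ g(U_e) \le C_e, \quad e \in E \tag{$\mathcal{C}_4$} $$
$$ g(Y_s) \ge R_s, \quad s \in S. $$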

It is known that $\mathcal{R}(T^*)$ and $\mathcal{R}(\bar{T}^*)$ are inner and outer bounds 
for the set of achievable rates (i.e. rates for which there exist 
network codes with arbitrarily small probability of decoding 
error). 

Moreover, $\mathcal{R}(T)$ is an outer bound for the set of 
achievable rates [9]. Similarly, $\mathcal{R}(T^{In})$ is an outer bound for the 
set of rates achievable with linear network codes [8]. Clearly 
$\mathcal{R}(T^{In}) \subset \mathcal{R}(T)$. The sum-rate bounds induced by $\mathcal{R}(T^{In})$ and 
$\mathcal{R}(T)$ can in principle be computed using linear programming, 
since they may be reformulated as 

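$$ \max \sum_{s \in S} g(Y_s) \quad \text{subject to} \quad g \in \mathcal{C}_1 \cap \mathcal{C}_2 \cap \mathcal{C}_3 \cap \mathcal{C}_4 \cap A \tag{5} $$

where A is either T or $T^{In}$, and $\mathcal{C}_1, \mathcal{C}_2, \mathcal{C}_3, \mathcal{C}_4$ are the subsets of 
pseudo-entropy functions satisfying the so-labeled constraints 
above. Clearly the constraint set $\mathcal{C}_1 \cap \mathcal{C}_2 \cap \mathcal{C}_3 \cap \mathcal{C}_4 \cap A$ is linear. 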

One practical difficulty with computation of (5) is the 
number of variables and the number of constraints due to T (or 
$T^{In}$), both of which increase exponentially with $|E|$. The aim 
of this paper is to find a simpler outer bound. One approach 
is to use the functional dependence structure induced by the 
network topology to eliminate variables or constraints from 
T [17]. Here we will take a related approach, that directly 
delivers an easily computable bound. 
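
To give a feel for this growth, the following Python sketch counts the LP variables and the polymatroid inequalities for n = |S| + |E| pseudo-variables. It assumes the standard count n + C(n, 2) 2^(n-2) of elemental inequalities describing T; the network constraints would add further rows, and the function name `lp_size` is purely illustrative.

```python
from math import comb

def lp_size(n):
    """Return (variables, elemental inequalities) for n pseudo-variables.

    Assumption: T is described by the n + C(n, 2) * 2**(n - 2)
    elemental inequalities, over 2**n - 1 non-empty subsets.
    """
    return 2**n - 1, n + comb(n, 2) * 2**(n - 2)

# A network with 2 sources and 7 edges has n = 9 pseudo-variables.
print(lp_size(9))     # (511, 4617)
print(lp_size(20))    # (1048575, 49807380)
```

Even a network with fewer than twenty links already leads to tens of millions of constraints under this count.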

III. Functional Dependence Graphs 

Definition 4 (Functional Dependence Graph): Let $\mathcal{X} = 
\{X_1, \ldots, X_N\}$ be a set of pseudo-variables with pseudo- 
entropy function g. A directed graph $Q = (V, E)$ with $|V| = N$ 
is called a functional dependence graph for $\mathcal{X}$ if and only if 
for all $i = 1, 2, \ldots, N$ 

$$ g(X_i \mid \{X_j : (j, i) \in E\}) = 0. \tag{6} $$

With an identification of $X_i$ and node $i \in V$, this definition 
requires that each pseudo-variable $X_i$ is a function (in the 
sense of Definition 1) of the pseudo-variables associated with 
its parent nodes. To this end, define 

$$ \pi(i) = \{j \in V : (j, i) \in E\}. $$

Where it does not cause confusion, we will abuse notation and 
identify pseudo-variables and nodes in the FDG, e.g. (6) will 
be written $g(i \mid \pi(i)) = 0$. 

Definition 4 is more general than the functional dependence 
graph of [15, Chapter 2]. Firstly, in our definition there is 
no distinction between source and non-source random vari- 
ables. The graph simply characterizes functional dependence 
between variables. In fact, our definition admits cyclic directed 
graphs, and there may be no nodes with in-degree zero 
(which are source nodes in [15]). We also do not require 
independence between sources (when they exist), which is 
implied by the acyclic constraint in [15]. Our definition of 
an FDG admits pseudo-entropy functions g with additional 
functional dependence relationships that are not represented by 
the graph. It only specifies a certain set of conditional pseudo- 
entropies which must be zero. Finally, our definition holds for 
a wide class of objects, namely pseudo-variables, rather than 
just random variables. 

Clearly a functional dependence graph in the sense of [15] 
satisfies the conditions of Definition 4, but the converse is not 
true. Henceforth when we refer to a functional dependence 
graph (FDG), we mean in the sense of Definition 4. Furthermore, 
an FDG is acyclic if Q has no directed cycles. A graph 
will be called cyclic if every node is a member of a directed 
cycle.² 

Definition 4 specifies an FDG in terms of local dependence 
structure. Given such local dependence constraints, it is of 
great interest to determine all implied functional dependence 
relations. In other words, we wish to find all sets A and B 
such that g(AB) = g(A). 

Definition 5: For disjoint sets $A, B \subseteq V$ we say A determines 
B in the directed graph $Q = (V, E)$, denoted $A \to B$, 
if there are no elements of B remaining after the following 
procedure: 

Remove all edges outgoing from nodes in A, and subsequently 
remove all nodes with no incoming edges and all edges with no 
tail node, repeating until no further removal is possible. 

For a given set A, let $\phi(A) \subseteq V$ be the set of nodes deleted 
by the procedure of Definition 5. Clearly $\phi(A)$ is the largest 
set of nodes with $A \to \phi(A)$. 
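
One way to read the procedure of Definition 5 is as a closure computation: a node is deleted once all of its parents are in A or have already been deleted. The Python sketch below follows that reading under assumptions that are not stated in the text: the FDG is given as a hypothetical dictionary `parents` mapping each node to its parent set, nodes without parents are treated as sources that are never deleted unless they belong to A, and the returned set includes A itself.

```python
def phi(parents, a):
    """Nodes determined by `a`: a itself plus every node whose parents
    are all determined (an assumed reading of Definition 5)."""
    determined = set(a)
    changed = True
    while changed:
        changed = False
        for node, pa in parents.items():
            # Assumption: parentless nodes are sources and are only
            # determined when they belong to `a`.
            if node not in determined and pa and pa <= determined:
                determined.add(node)
                changed = True
    return determined

def determines(parents, a, b):
    """True if A determines B for disjoint node sets A and B."""
    return set(b) <= phi(parents, a)

# Hypothetical 4-node FDG with edges 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
parents = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}

print(sorted(phi(parents, {1})))        # [1]
print(sorted(phi(parents, {1, 2})))     # [1, 2, 3]
print(determines(parents, {0}, {3}))    # True
```

Each pass over the nodes either adds a node or terminates the loop, so the running time is polynomial in the size of the graph.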

Lemma 1 (Grandparent lemma): Let $Q = (V, E)$ be a functional 
dependence graph for a polymatroidal pseudo-entropy 
function $g \in T$. For any $j \in V$ with $i \in \pi(j) \neq \emptyset$, 

$$ g(j \mid \pi(i), \pi(j) \setminus i) = 0. $$

Proof: By hypothesis, $g(k \mid \pi(k)) = 0$ for any $k \in V$. 
Furthermore, note that for any $g \in T$, conditioning cannot 
increase pseudo-entropy³ and hence $g(k \mid \pi(k), A) = 0$ for 
any $A \subseteq V$. Now using this property, and the chain rule, 

$$ \begin{aligned}
0 &= g(j \mid \pi(j)) \\
  &= g(j \mid \pi(i), \pi(j)) \\
  &= g(j, \pi(i), \pi(j)) - g(\pi(j), \pi(i)) \\
  &= g(j, \pi(j) \mid \pi(i)) - g(\pi(j) \mid \pi(i)) \\
  &= g(j, \pi(j) \setminus i \mid \pi(i)) - g(\pi(j) \setminus i \mid \pi(i)) \\
  &= g(j \mid \pi(i), \pi(j) \setminus i),
\end{aligned} $$

where the fifth equality holds because $g(i \mid \pi(i), A) = 0$ allows 
i to be dropped from both terms. 

■ 

2 In this paper we do not consider graphs that are neither cyclic nor acyclic. 
3 This is a direct consequence of submodularity, (3). 

We emphasize that in the proof of Lemma 1 we have only 
used the submodular property of polymatroids, together with 
the hypothesized local dependence structure specified by the 
FDG. 

Clearly the lemma is recursive in nature. For example, for $k \in \pi(i)$ 
it also gives $g(j \mid \pi(j) \setminus i, \pi(i) \setminus k, \pi(k)) = 0$, and so on. The 
implication of the lemma is that a pseudo-variable $X_j$ in an 
FDG is a function of $X_A$ for any $A \subseteq V$ with $A \to j$. 
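
As a small worked illustration (the chain below is a hypothetical example, not one taken from the text), consider a path m → k → i → j in which each node has exactly one parent. Applying Lemma 1 once, and then once more, gives:

```latex
\begin{align*}
\pi(j) = \{i\},\ \pi(i) = \{k\}
  &\;\Rightarrow\; g(j \mid \pi(i), \pi(j) \setminus i) = g(j \mid k) = 0, \\
\pi(k) = \{m\}
  &\;\Rightarrow\; g(j \mid \pi(j) \setminus i, \pi(i) \setminus k, \pi(k)) = g(j \mid m) = 0.
\end{align*}
```

This agrees with Definition 5: removing the edge leaving m leaves k, then i, then j without incoming edges, so m determines j.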

Theorem 1: Let Q be a functional dependence graph on 
the pseudo-variables $\mathcal{X}$ with polymatroidal pseudo-entropy 
function g. Then for disjoint subsets $A, B \subseteq V$, 

$$ A \to B \implies g(B \mid A) = 0. $$

Proof: Let $A \to B$ in the FDG Q. Then, by Definition 
5, there must exist directed paths from some nodes in A to 
every node in B, and there must not exist any directed 
path intersecting B that does not also intersect A. Recursively 
invoking Lemma 1, the theorem is proved. ■ 

Definition 5 describes an efficient graphical procedure to 
find implied functional dependencies for pseudo-variables with 
local dependence specified by a functional dependence graph 
Q. It captures the essence of the chain rule (4) for pseudo- 
entropies and the fact that pseudo-entropy is non-increasing 
with respect to conditioning (3), which are the main arguments 
necessary for manual proof of functional dependencies. 

One application of Definition 5 is to find a reduction of a 
given set C, i.e. to find a disjoint partition of C into A and B 
with $A \to B$, which implies $g(C) = g(AB) = g(A)$. On the 
other hand, it also tells which sets are irreducible. 

Definition 6 (Irreducible set): A set of nodes B in a functional 
dependence graph is irreducible if there is no $A \subset B$ 
with $A \to B \setminus A$. 

Clearly, every singleton is irreducible. In addition, in an 
acyclic FDG, irreducible sets are basic entropy sets in the 
sense of [17]. In fact, irreducible sets generalize the idea of 
basic entropy sets to the more general (and possibly cyclic) 
functional dependence graphs on pseudo-variables. 

A. Acyclic Graphs 

In an acyclic graph, let $\mathrm{An}(A)$ denote the set of ancestral 
nodes, i.e. for every node $a \in \mathrm{An}(A)$, there is a directed path 
from a to some $b \in A$. 

Of particular interest are the maximal irreducible sets: 
Definition 7: An irreducible set A is maximal in an acyclic 
FDG $Q = (V, E)$ if $V \setminus \phi(A) \setminus \mathrm{An}(A) \triangleq (V \setminus \phi(A)) \setminus \mathrm{An}(A) = 
\emptyset$, and no proper subset of A has the same property. 



Note that for acyclic graphs, every subset of a maximal 
irreducible set is irreducible. Conversely, every irreducible set 
is a subset of some maximal irreducible set [17]. Irreducible 
sets can be augmented in the following way. 

Lemma 2 (Augmentation): Let $A \subseteq V$ in an acyclic FDG 
$Q = (V, E)$. Let $B = V \setminus \phi(A) \setminus \mathrm{An}(A)$. Then $A \cup \{b\}$ is 
irreducible for every $b \in B$. 

This suggests a process of recursive augmentation to find 
all maximal irreducible sets in an acyclic FDG (a similar 
process of augmentation was used in [17]). Let Q be a 
topologically sorted⁴ acyclic functional dependence graph 
$Q = (\{0, 1, 2, \ldots\}, E)$. Its maximal irreducible sets can be 
found recursively via AllMaxSetsA(Q, {}) in Algorithm 1. In 
fact, AllMaxSetsA(Q, A) finds all maximal irreducible sets 
containing A. 

Algorithm 1 AllMaxSetsA(Q, A) 
Require: Q = (V, E), A ⊂ V 
  B ← V \ φ(A) \ An(A) 
  if B ≠ ∅ then 
    Output {AllMaxSetsA(Q, A ∪ {b}) : b ∈ B} 
  else 
    Output A 
  end if 



Algorithm 2 AllMaxSetsC(Q, A) 
Require: Q = (V, E), A ⊂ V 
  if v ∉ φ(A^c \ {v}) for all v ∈ A^c then 
    Output A^c 
  else 
    for all v ∈ A^c do 
      if v ∈ φ(A^c \ {v}) then 
        Output AllMaxSetsC(Q, A ∪ {v}) 
      end if 
    end for 
  end if 




Fig. 1. The butterfly network. 



B. Cyclic Graphs 

In cyclic graphs, the notion of a maximal irreducible set is 
modified as follows: 

Definition 8: An irreducible set A is maximal in a cyclic 
FDG $Q = (V, E)$ if $V \setminus \phi(A) = \emptyset$, and no proper subset of A 
has the same property. 

For cyclic graphs, every subset of a maximal irreducible set is 
irreducible. In contrast to acyclic graphs, the converse is not 
true. In fact there can be irreducible sets that are not maximal, 
and are not subsets of any maximal irreducible set. It is easy 
to show that 

Lemma 3: All maximal irreducible sets have the same 
pseudo-entropy. 

This fact will be used in development of our capacity bound 
for network coding in Section IV below. We are interested in 
finding every maximal irreducible set for cyclic graphs. This 
may be accomplished recursively via AllMaxSetsC(Q, {}) 
in Algorithm 2. Note that in contrast to Algorithm 1, 
AllMaxSetsC(Q, A) finds all maximal irreducible sets that do 
not contain any node in A. 

Example 1 (Butterfly network): Figure 1 shows the well- 
known butterfly network and Figure 2 shows the corresponding 
functional dependence graph. Nodes are labeled with node 
numbers and pseudo-variables (the source variables are $Y_1$ 
and $Y_2$; the $U_i$ are the edge variables, carried on links 
with capacity $C_i$). Edges in the FDG represent the functional 
dependency due to encoding and decoding requirements. 

4 I.e., order nodes such that if there is a directed edge from node i to j then 
i < j [9, Proposition 11.5]. 



The maximal irreducible sets of the cyclic FDG shown in 
Figure 2 are 

{1, 2}, {1, 5}, {1,7}, {1,8}, {2, 4}, {2, 7}, {2, 9}, {3, 4, 5}, 
{3, 4, 8}, {3, 7}, {3, 8, 9}, {4, 5, 6}, {5, 6, 9}, {6, 7}, {6, 8, 9}. 
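
The list above can be reproduced with a short Python sketch of the recursion in Algorithm 2. The parent map is an assumption: Figure 2 is not reproduced in this text, so the edges below were inferred from the butterfly topology (nodes 1 and 2 are the sources, with Y2 assumed to be decoded from U3, U8 and Y1 from U6, U9) and checked against the listed sets. The helper `phi` returns a set together with every node it determines, and the `seen` cache is an addition that avoids exploring the same complement twice.

```python
def phi(parents, a):
    """`a` plus every node whose parents are all determined."""
    determined = set(a)
    changed = True
    while changed:
        changed = False
        for node, pa in parents.items():
            if node not in determined and pa and pa <= determined:
                determined.add(node)
                changed = True
    return determined

def all_max_sets_c(parents, removed=frozenset(), seen=None):
    """Maximal irreducible sets containing no node of `removed`."""
    seen = set() if seen is None else seen
    keep = frozenset(parents) - removed        # the complement A^c
    if keep in seen:                           # already explored
        return set()
    seen.add(keep)
    # Nodes of A^c that are determined by the rest of A^c.
    droppable = [v for v in keep if v in phi(parents, keep - {v})]
    if not droppable:
        return {keep}
    found = set()
    for v in droppable:
        found |= all_max_sets_c(parents, removed | {v}, seen)
    return found

# Assumed FDG of the butterfly network (inferred, see the lead-in).
parents = {1: {6, 9}, 2: {3, 8}, 3: {1}, 4: {1}, 5: {2}, 6: {2},
           7: {4, 5}, 8: {7}, 9: {7}}

max_sets = sorted(sorted(s) for s in all_max_sets_c(parents))
print(len(max_sets))    # 15
for s in max_sets:
    print(s)
```

Under this assumed graph the sketch returns exactly the fifteen sets listed in Example 1, eight of which contain no source node.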

IV. Functional Dependence Bound 

We now give an easily computable outer bound for the total 
capacity of a network coding system. 

Theorem 2 (Functional Dependence Bound): Let 
$\mathcal{C}_1, \mathcal{C}_2, \mathcal{C}_3, \mathcal{C}_4$ be given network coding constraint sets. 
Let $Q = (V, E)$ be a functional dependence graph⁵ on the 

5 This FDG will be cyclic due to the sink demands $\mathcal{C}_3$. 




Fig. 2. FDG of the butterfly network. 



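(source and edge) pseudo-variables $Y_S, U_E$ with pseudo- 
entropy function $g \in \mathcal{C}_1 \cap \mathcal{C}_2 \cap \mathcal{C}_3 \cap \mathcal{C}_4 \cap T$. Let $\mathcal{B}_M$ be 
the collection of all maximal irreducible sets not containing 
source variables. Then 

$$ \sum_{s \in S} g(Y_s) \le \min_{B \in \mathcal{B}_M} \sum_{e : U_e \in B} C_e. $$

Proof: Let $B \in \mathcal{B}_M$, then 

$$ \begin{aligned}
\sum_{s \in S} g(Y_s) &= g(Y_S) && (\mathcal{C}_1) \\
  &= g(U_e : U_e \in B) && \text{(Lemma 3)} \\
  &\le \sum_{U_e \in B} g(U_e) && \text{(subadditivity of } g \in T\text{)} \\
  &\le \sum_{e : U_e \in B} C_e. && (\mathcal{C}_4)
\end{aligned} $$

Since this holds for every $B \in \mathcal{B}_M$, the bound follows. ■ 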



Maximal irreducible sets which do not contain source variables 
are "information blockers" from sources to corresponding 
sinks. They can be interpreted as information theoretic cuts 
in the network. Note that an improved bound can in principle 
be obtained by using additional properties of T (rather than 
just subadditivity). Similarly, bounds for linear network codes 
could be obtained by using $T^{In}$. 

Corollary 1: For single source multicast networks, Theo- 
rem 2 becomes the max-flow bound [9, Theorem 11.3] and 
hence is tight. 

Example 2 (Butterfly network): The functional dependence 
bound for the butterfly network of Figure 2 is 

$$ R_1 + R_2 \le \min\{C_3 + C_7,\ C_6 + C_7,\ C_3 + C_4 + C_5,\ C_3 + C_4 + C_8,\ C_3 + C_8 + C_9,\ C_4 + C_5 + C_6,\ C_5 + C_6 + C_9,\ C_6 + C_8 + C_9\}. $$
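
As a numerical illustration, the right-hand side of Theorem 2 can be evaluated directly from the source-free maximal irreducible sets of Example 1. The capacity values in this Python sketch are hypothetical and chosen only to show how the minimum moves when a single link is narrowed; the function name `fd_bound` is likewise illustrative.

```python
# Maximal irreducible sets of Example 1 that contain no source node,
# written as sets of edge indices.
blockers = [{3, 7}, {6, 7}, {3, 4, 5}, {3, 4, 8},
            {3, 8, 9}, {4, 5, 6}, {5, 6, 9}, {6, 8, 9}]

def fd_bound(blockers, capacity):
    """Minimum over the given sets of the summed edge capacities."""
    return min(sum(capacity[e] for e in b) for b in blockers)

# Hypothetical capacities: every edge has unit capacity.
unit = {e: 1.0 for e in range(3, 10)}
print(fd_bound(blockers, unit))      # 2.0

# Hypothetical bottleneck: halve the capacity of edge 7.
narrow = {**unit, 7: 0.5}
print(fd_bound(blockers, narrow))    # 1.5
```

With unit capacities the bound gives a sum rate of at most 2, attained by the two-element sets {3, 7} and {6, 7}.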
To the best of our knowledge, Theorem 2 is the tightest 
bound expression for general multi-source multi-sink network 
coding (apart from the computationally infeasible LP bound). 
Other bounds like the network sharing bound [13] and bounds 
based on information dominance [14] use certain functional 
dependencies as their main ingredient. In contrast, Theorem 2 
uses all the functional dependencies due to network encoding 
and decoding constraints. 

V. Conclusion 

Explicit characterization and computation of the multi- 
source network coding capacity region requires determination 
of the set of all entropic vectors T*, which is known to be an 
extremely hard problem. The best known outer bound can in 
principle be computed using a linear programming approach. 
In practice this is infeasible due to an exponential growth in 
the number of constraints and variables with the network size. 

We gave an abstract definition of a functional dependence 
graph, which extends previous notions to accommodate not 
only cyclic graphs, but more abstract notions of dependence. In 
particular, we considered polymatroidal pseudo-entropy 
functions, and demonstrated an efficient and systematic method 
to find all functional dependencies implied by the given local 
dependencies. 

This led to our main result, which was a new, easily 
computable outer bound, based on characterization of all 
functional dependencies in networks. We also showed that the 
proposed bound is tighter than some known bounds. 